Logistic Regression and Boosting for Labeled Bags of Instances
Abstract
In this paper we upgrade linear logistic regression and boosting to multi-instance data, where each example consists of a labeled bag of instances. This is done by connecting predictions for individual instances to a bag-level probability estimate by simple averaging and maximizing the likelihood at the bag level—in other words, by assuming that all instances contribute equally and independently to a bag’s label. We present empirical results for artificial data generated according to the underlying generative model that we assume, and also show that the two algorithms produce competitive results on the Musk benchmark datasets.
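The bag-level model described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that "simple averaging" means the arithmetic mean of instance-level sigmoid probabilities, and it fits the weights by plain gradient ascent on the bag-level log-likelihood. The function names (`bag_probabilities`, `bag_log_likelihood`, `fit_mi_logistic`) and the learning-rate and iteration settings are hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bag_probabilities(bags, w, b):
    """Bag-level P(y=1): arithmetic mean of instance-level probabilities.

    Assumption: "simple averaging" is taken to be the arithmetic mean.
    """
    return np.array([sigmoid(X @ w + b).mean() for X in bags])

def bag_log_likelihood(bags, labels, w, b, eps=1e-12):
    """Bernoulli log-likelihood evaluated at the bag level."""
    p = np.clip(bag_probabilities(bags, w, b), eps, 1 - eps)
    return float(np.sum(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def fit_mi_logistic(bags, labels, lr=0.1, n_iter=500, eps=1e-6):
    """Gradient ascent on the bag-level log-likelihood (illustrative optimizer)."""
    d = bags[0].shape[1]
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        grad_w, grad_b = np.zeros(d), 0.0
        for X, y in zip(bags, labels):
            q = sigmoid(X @ w + b)                # instance-level probabilities
            p = np.clip(q.mean(), eps, 1 - eps)   # bag-level probability
            coef = (y - p) / (p * (1 - p))        # d(log-lik)/d(p) for this bag
            s = q * (1 - q)                       # sigmoid derivative per instance
            grad_w += coef * (s[:, None] * X).mean(axis=0)
            grad_b += coef * s.mean()
        # Average the gradient over bags so the step size does not depend on
        # the number of bags.
        w += lr * grad_w / len(bags)
        b += lr * grad_b / len(bags)
    return w, b
```

Each bag is a 2-D array with one row per instance, and `labels` holds one 0/1 label per bag. Because only the bag label enters the likelihood, no instance ever needs its own label.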
Related resources
Reducing Annotation Effort using Generalized Expectation Criteria
Generalized expectation (GE) criteria [McCallum et al., 2007] are terms in objective functions that assign scores to values of model expectations. In this paper we introduce GE-FL, a method that uses GE to train a probabilistic model using associations between input features and classes rather than complete labeled instances. Specifically, here the expectations are model predicted class distrib...
Multiple Instance Metric Learning from Automatically Labeled Bags of Faces
Metric learning aims at finding a distance that approximates a task-specific notion of semantic similarity. Typically, a Mahalanobis distance is learned from pairs of data labeled as being semantically similar or not. In this paper, we learn such metrics in a weakly supervised setting where “bags” of instances are labeled with “bags” of labels. We formulate the problem as a multiple instance le...
Multiple Instance Learning with Query Bags
In many machine learning applications, precisely labeled data is either burdensome or impossible to collect. Multiple Instance Learning (MIL), in which training data is provided in the form of labeled bags rather than labeled instances, is one approach for dealing with ambiguously labeled data. In this paper we argue that in many applications of MIL (e.g. image, audio, text, bioinformatics) a s...
Review of Multi-Instance Learning and Its applications
Multiple Instance Learning (MIL) is proposed as a variation of supervised learning for problems with incomplete knowledge about labels of training examples. In supervised learning, every training instance is assigned a discrete or real-valued label. By comparison, in MIL labels are assigned only to bags of instances. In the binary case, a bag is labeled positive if at least one instanc...
Boosted Regression (Boosting): An introductory tutorial and a Stata plugin
Boosting, or boosted regression, is a recent data mining technique that has shown considerable success in predictive accuracy. This article gives an overview of boosting and introduces a new Stata command, boost, that implements the boosting algorithm described in Hastie et al. (2001, p. 322). The plugin is illustrated with a Gaussian and a logistic regression example. In the Gaussian regress...
Publication date: 2004